A universal global measure of univariate and bivariate data utility for anonymised microdata
Author/editor: Kocar, Sebastian
Published in (Monograph or Journal): CSRM Methods Series
Publisher: Centre for Social Research and Methods
Year published: 2018
Issue no.: 4/2018

Abstract

This paper presents a new global data utility measure, based on a benchmarking approach. Data utility measures assess the utility of anonymised microdata by measuring changes in distributions and their impact on bias, variance and other statistics derived from the data. Most existing data utility measures have significant shortcomings: they are limited to continuous variables, to univariate utility assessment, or to local information loss measurements. The proposed global data utility model addresses these shortcomings. It combines univariate and bivariate data utility measures, which calculate information loss using various statistical tests and association measures, such as the two-sample Kolmogorov–Smirnov test, chi-squared test (Cramér's V), ANOVA F test (eta squared), Kruskal–Wallis H test (epsilon squared), Spearman coefficient (rho) and Pearson correlation coefficient (r). The model is universal, since it also includes new local utility measures for global recoding and variable removal data reduction approaches, and it can be used for data protected with all common masking methods and techniques, from data reduction and data perturbation to generation of synthetic data and sampling. At the bivariate level, the model includes all required data analysis steps: assumptions for statistical tests, statistical significance of the association, direction of the association and strength of the association (effect size).
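
As a rough illustration of the kind of univariate and bivariate comparison described above, the following Python sketch computes a two-sample Kolmogorov–Smirnov statistic and Cramér's V, then compares the original and anonymised values. It is not the paper's model: the function names, the toy data and the use of an absolute difference in Cramér's V as a bivariate loss are assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical distribution functions of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def cramers_v(xs, ys):
    """Cramer's V for two categorical variables (no bias correction)."""
    n = len(xs)
    joint, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: ages top-coded at 70 as the anonymisation step.
age_original = [21, 34, 45, 52, 67, 73, 88, 91]
age_anonymised = [min(age, 70) for age in age_original]
univariate_loss = ks_statistic(age_original, age_anonymised)

# Hypothetical categorical pair before and after perturbation.
sex = ["m", "m", "f", "f"]
employed_original = ["y", "y", "n", "n"]
employed_anonymised = ["y", "n", "y", "n"]
# Assumed loss definition: absolute change in association strength.
bivariate_loss = abs(cramers_v(sex, employed_original)
                     - cramers_v(sex, employed_anonymised))

print(univariate_loss, bivariate_loss)
```

Both quantities here lie between 0 and 1, with 0 meaning no measured change. The toy perturbation was chosen to remove the association entirely, so the assumed bivariate loss reaches its maximum.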

Since the model should be executed automatically with statistical software code or a package, our aim was to allow all steps to be done with no additional user input. For this reason, we propose approaches to automatically establish the direction of the association between two variables using test-reported standardised residuals and sums of squares between groups.
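
One way such an automatic direction check could look for two categorical variables is sketched below. The abstract does not give the exact rule, so this is only an assumed reading: it computes Pearson residuals, (observed − expected) / sqrt(expected), for each cell of the contingency table and treats the direction as preserved when every cell keeps the same residual sign after anonymisation. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson_residuals(xs, ys):
    """Pearson residual for every cell of the xs-by-ys contingency table."""
    n = len(xs)
    joint, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    residuals = {}
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            residuals[(r, c)] = (joint[(r, c)] - expected) / sqrt(expected)
    return residuals


def residual_signs(xs, ys, tol=1e-12):
    """Map each cell to -1, 0 or +1 according to its residual sign."""
    return {cell: (z > tol) - (z < -tol)
            for cell, z in pearson_residuals(xs, ys).items()}


def direction_preserved(original, anonymised):
    """Assumed rule: direction is preserved when all cells keep their sign.
    Tables with different categories (e.g. after recoding) compare unequal."""
    return residual_signs(*original) == residual_signs(*anonymised)


# Hypothetical toy data: income group versus a yes/no outcome.
income = ["low", "low", "low", "high", "high", "high"]
outcome_original = ["no", "no", "yes", "yes", "yes", "no"]
outcome_masked = ["no", "no", "no", "yes", "yes", "yes"]      # same direction
outcome_reversed = ["yes", "yes", "no", "no", "no", "yes"]    # flipped

print(direction_preserved((income, outcome_original), (income, outcome_masked)))
print(direction_preserved((income, outcome_original), (income, outcome_reversed)))
```

Because the check needs only the two tables, it runs without user input, which is the property the paragraph above asks for. A comparable check for a categorical and a continuous variable would need group means or between-group sums of squares, and is not sketched here.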

Although the model is a global data utility model, individual local univariate and bivariate utility can still be assessed for different types of variables, as well as for both normal and non-normal distributions. The next important step in global data utility assessment would be to develop either program code or an R statistical software package for measuring data utility, and to establish the relationship between univariate, bivariate and multivariate data utility of anonymised data.

Keywords: statistical disclosure control, data utility, information loss, distribution estimation, bivariate analysis, effect size.

File attachments

CSRM_MP4_2018_ANONYMISED_MICRODATA.pdf (1.11 MB)